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Abstract 

The estimation of mutation rates and relative fitnesses in fluctuation analysis is based on the unrealistic hypothesis that the 
single-cell times to division are exponentially distributed. Using the classical Luria-Delbruck distribution outside its 
modelling hypotheses induces an important bias on the estimation of the relative fitness. The model is extended here to 
any division time distribution. Mutant counts follow a generalization of the Luria-Delbruck distribution, which depends on 
the mean number of mutations, the relative fitness of normal cells compared to mutants, and the division time distribution 
of mutant cells. Empirical probability generating function techniques yield precise estimates both of the mean number of 
mutations and the relative fitness of normal cells compared to mutants. In the case where no information is available on the 
division time distribution, it is shown that the estimation procedure using constant division times yields more reliable 
results. Numerical results both on observed and simulated data are reported. 
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Introduction 

The estimation of mutation parameters in cell growth exper- 
iments, or fluctuation analysis, has been the object of many studies 
since its introduction by Luria and Delbruck in 1943 [1]: see 
reviews by Stewart et al. [2], Angerer [3], and Foster [4]. 
Fluctuation analysis is based on the Luria-Delbruck distribution, 
derived under different assumptions by Lea and Coulson [5] and 
Bartlett (in the discussion following Armitage [6]). Mandelbrot [7], 
then Bartlett [8] later generalized the Luria-Delbruck distribution 
to the differential growth case. Since then, fluctuation analysis with 
differential growth rates has been advocated by several authors [9- 
13]. As shown in [14], Luria-Delbrtick distributions are made of 
three ingredients: 

1 . The mean number of mutations a, which is the parameter of main 
interest. It is the product of the individual probability of mutation 
(also called mutation rate) by the final number of cells. As already 
remarked by Luria and Delbruck [1], the law of small numbers 
implies that the random number of mutations that occur during 
the experiment follows a Poisson distribution with expectation a. 

2. The relative fitness p of normal cells to mutants, i.e. the ratio of the 
exponential growth rate of normal cells to that of mutants. 
(Growth rate refers here to the constant speed at which the 
logarithm of a population of cells grows, not to the size increments 
of individual cells). The time scale does not influence final counts 
of mutant cells: it may be chosen so that the growth rate of 
mutants is 1, in which case p is the exponential growth rate of 
normal cells. Exponential growth implies that most random 
mutations occur rather close to the end of the experiment, and 
more precisely that the time during which a new mutant clone 
develops has negative exponential distribution with parameter p. 

3. The random number of cells M(t) in a mutant clone that develops for a 
finite time t: it depends crucially on the division times of mutants. 



In the classical Luria-Delbruck model, mutants are supposed to 
have exponentially distributed division times, which implies that 
M(t) follows the geometric distribution with parameter e - ' 
(choosing the time scale so that mutants have unit growth rate). 

The first two points can be considered as established facts: they 
are in accordance with experimental data, and grounded on well 
known probabilistic results. On the opposite, the hypothesis of 
exponentially distributed division times is a purely mathematical 
convenience and does not match experimental observations: as 
remarked as early as 1932 by Kelly and Rahn [15,16], actual 
division times data are unimodal and right-skewed rather than 
exponential: see [17]. The question investigated here is: which bias 
on the estimation of the parameters does the exponential 
distribution hypothesis induce, and how can it be reduced? 

The "mathematical convenience" can be challenged. Admit- 
tedly, the exponential distribution of division times is the first one 
under which a closed mathematical expression for the distribution 
of mutants was obtained. Notwithstanding, it will be shown that a 
joint estimation procedure for a. and p can be implemented 
whatever the distribution of division times. Moreover if the 
division times of mutants are supposed to be constant, estimation 
procedures are exactly as computationally effective as under the 
exponential hypothesis. Since the pioneering observations of Kelly 
and Rahn [15] progress in experimental settings, from microscopic 
observation of single-cell behavior to flow chambers and 
automated growth analyzers, has fueled many studies on division 
times and their distributions. Division time data have been fitted 
by several types of distributions: from Gamma and Log-beta [18], 
to Log-normal and reciprocal normal [19]: see John [20] and 
references therein. More recent references include [21—24]. There 
is no such object as "the" distribution of division times; firsdy 
because it would depend not only on the species, strain, 
experimental conditions, etc., secondly because many different 
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families of distributions can usually fit any given set of observed 
data. I have chosen three families (Gamma, Log-normal, Inverse 
Gaussian) and one data set: the historical observations of Kelly and 
Rahn on Bacterium aerogenes [15]. A maximum likelihood 
estimation of the parameters on the data led to one particular 
distribution in each family, that was rescaled to unit growth rate. 
The three distributions so obtained were considered as realistic 
and used as benchmarks for extensive Monte-Carlo studies. 
Samples of size 100 of generalized Luria-Delbriick distributions 
were repeatedly simulated for different values of a and p, and for 
each of the three realistic distributions. The main conclusion was 
that using the classical Luria-Delbriick distribution estimation 
procedure yields satisfactory results for the estimation of the mean 
number of mutations a but introduces a sizeable bias on the 
estimation of the relative fitness p. The estimation procedure that 
uses constant division times has a negligible bias and a much better 
precision on p. 

I have developed in R [25] a set of functions that output samples 
of generalized Luria-Delbriick distributions, compute estimates, 
confidence regions and p-values for hypothesis testing. These 
functions have been made available on line: http:/ /www.ljk.imag. 
fr/ membres/Bernard.Ycart/LD/. 

Results 

Simulation experiments 

I denote hereafter by GLD(a.,p,F) the generalized Luria- 
Delbriick distribution with parameters a, p, and F. it is the 
distribution of the final number of mutants in a fluctuation analysis 
experiment, when the mean number of mutations is a, the relative 
fitness of normal cells compared to mutants is p, and the 
distribution of mutant division times is F. The particular case 
where division times are exponentially distributed is the classical 
Luria-Delbriick distribution LD(cc,p). Detailed definitions will be 
given in the 'models and methods' section. In real fluctuation 
analysis experiments, the actual distribution F of division times is 
unknown. Therefore the question to be answered was the 
following: if a sample of the generalized Luria-Delbriick distribu- 
tion GLD(oc,p,F) has been produced, and estimates a and p are 
computed from another division time model than F, by how much 
are these estimates biased, how reliable confidence intervals on a 
and p can be? 

Three distributions were used in simulation procedures: 
Gamma, Log-normal, and Inverse Gaussian; they were adjusted 
on Kelly and Rahn's Bacterium aerogenes data. The exact 
definition of the three distributions is detailed in the 'models and 
methods' section. Two models were considered for estimation: the 
exponential model (division times follow the negative exponential 
distribution, i.e. the classical model), and the Dirac model (all 
division times are equal to the same value). The corresponding 
distribution functions are denoted by -F exp and Ffo. The 
estimation procedure is explained in the 'models and methods' 
section. Figure 1 represents the evolution of three typical clones, 
simulated with the Dirac model, the Log-normal model, and the 
exponential model: the exponential model is much more irregular 
than observed in practice: see e.g. Figure 5 in [26] . 

The simulation study consisted in simulating samples of the 
GLD{ai,p,F), i^being a Gamma, Log-normal, or Inverse Gaussian 
distribution, then estimating a and p as if Fhad been F exp or i^ir- 
A simulation function for the GLD(a,p,F) has been included in 
the R script made available on line. It was used to output 10000 
samples of size 100 for 27 different sets of parameters: 0£= 1,4,8, 
p = 0.8, 1.0, 1.2, F being one of the three distributions mentioned 
above. Apart from the extensive study of [27], usual fluctuation 



experiment samples have size of order a few tens, which motivated 
my choice for the sample size. The range of values for p is typical 
of practical situations. For a, very small values were not considered 
as significant: if tz<l, a large part of the information is contained 
in the frequency of zeros: the so called po-method gives almost as 
good results on a as any other estimator, independently from the 
model [14]. 

For each of the 270000 samples, and for the two models F exp 
and i^ir, the estimates of ot and p were computed, together with 
their confidence intervals at level 95%. The results obtained with 
the three distributions Gamma, Log-normal, and Inverse Gauss- 
ian, turned out to be very similar. Only the results for Log-normal 
division times are reported here. Figure 2 displays the boxplots of 
the estimated values of ot and p for the 9 couples of parameters 
a= 1,4,8, p = 0.8, 1.0, 1.2. The following visual observations can be 
made: 

• the classical exponential model clearly overestimates p and has 
a rather large dispersion of estimated values, 

• the Dirac model correctly estimates p. It induces a much 
smaller bias and dispersion, 

• both models correctly estimate a. 

Further precisions are given in Table 1, were mean biases on 
1 0000 samples are given for each of the 9 couples of parameters 
(c/.,p) and the two models. The mean bias for estimates of p using 
the exponential model (last column of Table 1) is quite sizeable: 
between 10% and 30% of the true value. 

The quality of confidence intervals when the model is not 
adapted is illustrated on Table 2. For each of the 27000 samples 
of size 100, confidence intervals for a and p at confidence level 
95% have been computed using the exponential and Dirac 
model. Out of them, a theoretical proportion of 0.95 should 
contain the true value of the estimated parameter. The 
proportion of the 10000 intervals containing the true value 
has been computed for each value of the parameters. Table 2 
shows the results for the Log-normal samples (results for the 
other two distributions are similar). The confidence intervals for 
a. had a correct proportion of success for both models, slightly 
better for estimates using the exponential model. Confidence 
intervals on p using the Dirac model are also correct. However, 
the estimation of p using the exponential model was not reliable: 
up to 30% of the 95% confidence intervals did not contain the 
true value of p (last column of Table 2). This result is in 
accordance with the strong bias discussed above. 

The parameter of main interest being a, the results of Tables 1 
and 2 are encouraging: the bias on oc and the coverage probability 
of confidence intervals remain good, whichever model is used for 
estimation. In order to confirm this and evaluate the bias on a for 
larger values, another simulation experiment was made. For each 
of the two extreme models exponential and Dirac, for 
a=l,2, ...,10 then a= 10,20, ... ,100 and p = 0.8, 1.0,1.2, 
1 0000 samples of size 1 00 were simulated, and the estimate of a 
calculated with the other model. It can be considered that the 
biases so obtained are an upper bound for the biases induced by 
using any of the two extreme cases for an unknown division time 
distribution. The relative bias was calculated as the difference 
between the mean estimate and the true value of a, divided by the 
true value of a. The results are plotted on Figure 3. For a<3, the 
bias is virtually negligible. For a>4, estimating as if division times 
were constant (red points) induces a positive bias, estimating as if 
they were exponential (green points) induces a negative bias. The 
relative bias remains smaller than 5% for cc< 10. Notice that in all 
cases, for any given value of a the bias increases with p. 
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Log-normal 



Exponential 






Figure 1. Clones under of Dirac, Log-normal, and Exponential models. The Log-normal distribution has been adjusted on Kelly and Rahn's 
data. All three distribution have been scaled to have unit growth rates. Clones were simulated up to time 5. 
doi:1 0.1 371 /journal.pone.0080958.g001 
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Figure 2. Boxplots of estimates of a and />, using the exponential and the Dirac models. Red horizontal lines mark true values of the 
parameters. For each of the 9 sets of parameters ot= 1,4,8 (rows) and p = 0. 8, 1.0,1.2 (columns), 10000 samples of size 100 of the GLD(ct,p,F) were 
simulated, F being the Log-normal distribution adjusted on Kelly and Rahn's data. The estimates of a. and p were calculated with the two models 
Dirac and exponential. Each boxplot represents the distribution of the 10000 estimates obtained by the Dirac model (left) and the exponential model 
(right). 

doi:1 0.1 371 /journal.pone.0080958.g002 
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Table 1. Mean biases on estimates of alpha and rho. 



parameters 






Pdii 


Pesp 


a= 1, p = 0.8 


0.011 


0.016 


0.003 


0.083 


a = 1, p = 1.0 


0.008 


0.011 


0.002 


0.163 


a = 1, p = 1.2 


0.010 


0.010 


0.003 


0.278 


a = 4, p = 0.8 


0.061 


0.064 


0.000 


0.073 


a = 4, p = 1 .0 


0.061 


0.041 


-0.001 


0.137 


a = 4, p = 1 .2 


0.067 


0.033 


-0.001 


0.235 


a = 8, p = 0.8 


0.166 


0.028 


0.002 


0.056 


a = 8, p = 1 .0 


0.142 


-0.059 


-0.001 


0.107 


a = 8, p = 1 .2 


0.156 


-0.079 


0.000 


0.188 



For each of the 9 sets of parameters (left column), 1 0000 samples of size 1 00 of 
the GLD(ct.,p,F) were simulated, F being the Log-normal distribution adjusted 
on Kelly and Rahn's data. The estimates of a and p were calculated with the two 
models Dirac and exponential. The estimated bias is the mean difference 
between the estimate and the true value. Biases on p with the classical model 
(rightmost column} are of order 10% to 20%. 
doi:1 0.1 371 /joumal.pone.0080958.t001 

Having good estimates of the two parameters does not 
necessarily assure goodness-of-fit. In another experiment, 10000 
samples of the GLD(S,l.2,F) were drawn, F being the 'realistic' 
log-normal distribution. Each sample was adjusted both by the 
Dirac and exponential models: a and p were estimated for each 
model and the goodness-of-fit of the sample with the two adjusted 
distributions was tested, using the discrete version of the 
Kolmogorov-Smirnov test implemented in the R package dgof 
[28]. The test detected the difference in about 40% of the case 
(39% of p-values below 0.05 for the Dirac model, 43% for the 
exponential model). However, it must be observed that since the 
data were used to calculate the adjusted distribution, the p-values 
cannot be interpreted as if the distribution was independent from 
the data. More significantly, the comparison of Kolmogorov- 
Smirnov distances showed that the Dirac model was a better 
adjustment in 67% of the cases. This is coherent with the results of 
Table 1. 

Published data sets 

The simulation study of the previous section indicates that the 
estimates of a should be coherent whatever the model, whereas the 
exponential model overestimates p. In order to evaluate the 
difference in actual experiments, five sets of published data were 
used. Luria and Delbriick [1] (Table 2, p. 504) had data under 
three different experimental conditions. I have grouped in sample 
A experiments numbers 1 , 10, 11 and 21b; in sample B 
experiments 1 6 and 1 7 . Data published in Boe et al. [2 7] , Rosche 
and Foster [29], and Zheng [12] were also used. For each data set 
the 95% confidence intervals on a and p were computed using the 
exponential and the Dirac model. Results are reported in Table 3. 
The data set from [29] has a high frequency of zeros, and no 
jackpot; this explains why p cannot be reliably estimated by the 
exponential model. The Dirac model gives a more realistic 
estimate. In all cases, confidence intervals for a are similar. 
Confidence intervals on p are different, even though they overlap. 
As an example, for the Boe et al. data [27], the estimate of p given 
by the Dirac model is 0.738; the estimate given by the exponential 
model is 0.824, i. e. 1 1.6% larger. That difference is coherent with 
what has been observed on simulated data. Also, the amplitudes of 
the confidence interval under the Dirac and exponential models 
are 0. 1 34 and 0.172: the precision under the Dirac model is better. 



Fluctuation Analysis: Can Estimates Be Trusted? 



Table 2. Proportion of success for 95% confidence intervals. 



parameters 


Sdit 


^L'.\p 


Pin 


Pexp 


a=1, p = 0.8 


0.950 


0.950 


0.951 


0.960 


« = 1, p=1.0 


0.951 


0.952 


0.946 


0.954 


01=1, p=1.2 


0.954 


0.953 


0.950 


0.957 


a = 4, p = 0.8 


0.945 


0.947 


0.949 


0.896 


o: = 4, p= 1.0 


0.947 


0.950 


0.947 


0.845 


!X = 4, p= 1.2 


0.949 


0.952 


0.946 


0.787 


a = 8, p = 0.8 


0.948 


0.956 


0.950 


0.878 


a = 8, p= 1.0 


0.948 


0.953 


0.952 


0.805 


«=8, p=1.2 


0.944 


0.948 


0.949 


0.695 



For each of the 9 sets of parameters (left column), 1 0000 samples of size 1 00 of 
the GLD(c/.,p,F) were simulated, F being the Log-normal distribution adjusted 
on Kelly and Rahn's data. The 95% confidence intervals for a and p were 
calculated with the two models Dirac and exponential. The entries of the table 
are proportions of the 10000 samples for which the true value is in the 
confidence interval. A result close to 0.95 indicates a satisfactory estimation. 
doi:1 0.1 371 /joumal.pone.0080958.t002 

The goodness-of-fit was tested for the two models, using the 
discrete version of the Kolmogorov-Smirnov test [28]. The results, 
reported in Table 4, are not conclusive: both adjustements are 
good in all cases. The Dirac model is (slightly) better for three 
datasets out of five. 

Discussion 

Dealing with fluctuation analysis experiments and the calcula- 
tion of mutation rates, three different levels must be distinguished: 
the reality which remains unknown, the mathematical model, and 
the estimation technique. 

The unknown reality 

Mutant counts at the end of a fluctuation analysis experiment 
are the result of 

1 . a random number of mutations occurring with small 
probability among a large number of cell divisions, 

2. the random times during which mutant clones stemming from 
each mutation develop, 

3. the number of cells that a clone developing for a given time 
may produce. 

The mathematical model 

All models can be interpreted according to the same three 
points. The first two are hardly disputable; the third one is much 
more controversial. 

1 . Due to the law of small numbers, the number of mutations 
must follow a Poisson distribution with expectation a, 
understood as the mean number of mutations occurring during 
the experiment, i.e. the product of the individual probability of 
mutation (also called mutation rate) by the final number of 
cells. 

2. The developing time of a random clone has exponential 
distribution with parameter p, provided the time scale has been 
chosen so that the growth rate of mutants is 1 : p is the ratio of 
the growth rate of normal cells to that of mutants, or else the 
relative fitness. 
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relative bias on a from 1 to 10 



relative bias on a from 10 to 100 





Figure 3. Relative bias on a between the exponential and the Dirac models. Ten thousand samples of size 100 were simulated for the 
LD(a,p) for alpha between 1 and 10 (left panel), then between 10 and 100 (right panel) and p = 0. 8, 1.0,1.2. The estimate of a was computed using 
the GLD(a,p,F i [ I ), then averaged over all samples. The relative bias was calculated as the difference between the mean estimate and the true value 
of ct, divided by the true value of a. Results are plotted as red points. The results for the opposite experiment (i.e. simulating the GLD{a.,p,F^), and 
estimating using the LD(y.,p)) are plotted as green points. 
doi:1 0.1 371 /journal.pone.0080958.g003 



3. The distribution of the number of cells that a clone developing 
for a given time can produce depends on various modelling 
hypotheses, such as: 

• if a mutation occurs during a division, only one of the two 
daughter cells is a mutant 

• mutant clones develop forever as mutants (no back mutation) 

• no cell dies before dividing 

• the division times are independent and identically distributed 

• the distribution of division times is exponential 

Since the early forties (and maybe even before: see [30]), 
mathematicians have struggled to propose sets of modelling 
hypotheses that allowed explicit computations of probabilistic 
distributions. Following Lea and Coulson [5], Bardett [6]), and 



Table 3. Confidence intervals for published data sets. 





reference 


size 


«dir 


*exp 




Pexp 


Luria & 

Delbruck A [1] 


42 


[5.30; 9.09] 


[5.22:8.89] 


[0.77; 1.19] 


[0.82; 1.34] 


Luria & 

Delbruck B [1] 


32 


[0.34; 1.03] 


[0.35; 1.04] 


[0.21; 0.76] 


[0.18; 0.80] 


Boe et al. [27] 


1102 


[0.64; 0.77] 


[0.65; 0.77] 


[0.69; 0.83] 


[0.73; 0.91] 


Roshe & Foster 
[29] 


52 


[1.03; 1.98] 


[1.03; 1.98] 


[1.02; 4.25] 


[0; 12.12] 


Zheng [12] 


30 


[6.79; 13.21] [6.66; 12.78] 


[0.64; 1.02] 


[0.67; 1.11] 



For 5 published data sets, the 95% confidence intervals on a and p were 
calculated with the two models Dirac and exponential. 
doi:1 0.1 371 /journal.pone.0080958.t003 



Haldane [30,31], the first four of the above hypotheses have been 
widely agreed upon. As for the distribution of division times, the 
exponential model that leads to the classical Luria-Delbriick 
distribution has largely prevailed [32,33], though constant division 
times have also been considered [30,31]. At first, only the case 
were normal cells and mutants had the same growth rate (p — 1) 
was studied. But soon, with Mandelbrot [7] and Bartlett [8], the 
model was generalized to differential growth rates [9-13]. 
Strangely enough, whereas the Poisson approximation (point 1. 
above) has been considered an obvious fact since Luria and 
Delbruck [1], the exponential distribution of development times 
(point 2.) has remained unnoticed, even though it was known as a 
basic fact of branching process theory at least since the sixties [34] . 



Table 4. Kolmogorov-Smirnov goodness-of-fit tests for 
published data sets. 







Dirac model 


Exponential model 


reference 


distance 


p-value 


distance 


p-value 


Luria & Delbruck A [1] 


0.055 


1.000 


0.057 


0.999 


Luria & Delbruck B [1] 


0.069 


0.998 


0.055 


1.000 


Boe et al. [27] 


0.015 


0.955 


0.006 


1.000 


Roshe & Foster [29] 


0.046 


1.000 


0.049 


1.000 


Zheng [12] 


0.063 


1.000 


0.070 


0.997 



The Kolmogorov-Smirnov distance between the sample and the adjusted 
distribution was calculated for the two models Dirac and exponential. The 
parameters of the adjusted models were estimated from the data by the GF 
method. Since the adjusted model used estimations from the data, the p-value 
can only be taken as an indication. Calculations were made using the R package 
dgof [28]. 

doi:1 0.1 371 /journal.pone.0080958.t004 
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It was remarked in [14], and leads not only to a much simpler 
derivation of closed mathematical formulas, but also to simple and 
efficient simulation algorithms. 

A distinctive hypothesis of the model considered here (as in most 
previous works), is that cells can only divide and never die. A 
model taking cell deaths into account was described in [35], and 
an estimation procedure was proposed. In practice, the proportion 
of deaths is known to be rather low [22,36]. As shown in [35], 
neglecting cell deaths underestimates a and p. Another dubious 
hypothesis of the models considered so far is the independence of 
individual division times. The independence hypothesis was 
questioned very early [37]. Indeed, actual division time data show 
two types of correlation [38]: between the division times of a 
mother cell and its two daughters, and between the two sisters 
conditioning on the mother. It was remarked long ago by Powell 
[39] (see also [40,41]) that sister-correlations do not influence 
exponential growth. The effect of mother-correlation on growth 
rates was discussed by Harvey in [41]. Its influence on the 
estimation of parameters in fluctuation analysis will be the object 
of future work. 

The estimation technique 

From Luria and Delbrtick [1], the mean number of mutations a 
has been the parameter of interest, whereas the relative fitness p 
was regarded at best as a nuisance parameter, or very often taken 
as fixed: p = 1 [2-4] . Indeed, the relative fitness can be 
independendy estimated, by separately growing clones of mutants 
and normal cells, and calculating their growth rates [42] . If this 
has been done, then p can be considered as known, which leads to 
a better estimation of a, as pointed out in [14]. Yet p is rarely 
known in practice. Its independent calculation may be difficult in 
some cases (in vivo experiments for instance). Considering 
differential growth rates is necessary, as pointed out by several 
authors [9-14]; however, many studies are still being made using 
the LD((X,1) without questioning the equal rate hypothesis (e.g. 
[43,44]). 

Once a mathematical model has been chosen, many estimation 
procedures for a. and/or p are available [4,14]. As in any 
parametric estimation problem, the questions are: 

• are estimates unbiased? 

• can confidence intervals be computed? 

• is the mean squared error minimal? 

Only three methods answer positively the first two questions: the 
77o-mefhod [1,4], the Maximum Likelihood (ML) method 
[12,13,45,46], and the Generating Function (GF) method [14]. 
As in many other estimation problems, the best method in terms of 
mean squared error is the ML method. As was shown in [14], the 
p^-raeXhoA performs well for small values of a.. The GF method is 
nearly as precise as the ML, with a much broader range of 
applicability, and virtually null computing time. 

To go further, three more criteria must be added: 

• to how many models can the procedure be applied? 

• can it work on a wide enough range of values of a and p? 

• is it robust to variations of modelling hypotheses, or else how 
much bias estimating with a wrong model does induce? 

As far as the first and third questions are concerned, the winner 
is the po -method: the distribution of the estimator is easily 
computed under any model, and the result does not depend on 
any hypothesis, except the fact that cells always divide and never 
die (see [35] for an alternative in the case of cell deaths). However, 



it relies upon a positive number of zeros in the sample, and is 
therefore limited to relatively small values of a (smaller than 2 in 
practice). Such a limitation is not statistically acceptable. 

Regarding the first question, the ML procedure can be applied 
if the probabilities of mutant counts can be computed as a function 
of the parameters. This is the case for only two distributions so far: 
the classical LD(a,p) (independent exponential division times, no 
deaths), and the GLD(a,p,Fiu) (constant division times, no 
deaths). The GF procedure can be applied to any GLD(a,p,F), 
provided the distribution F has been previously estimated. It was 
applied to a cell-death model in [35]. Actually the Monte-Carlo 
algorithm proposed here can be used for any model, as soon as 
clones can be simulated. If the distribution of division times is 
unknown, any one of the two models above (exponential or 
constant division times) can be chosen. 

The second question has been discussed in [14]. Even with a 
very careful algorithmic implementation [12,47], the ML method 
can compute estimates only for samples in which the maximal 
value does not exceed a certain limit. Yet a crucial feature of 
mutant counts is the appearance of jackpots, i.e. unusually large 
values. For the ML method to be applied, the highest jackpots 
must be levelled out, which induces a systematic bias both on a 
and p. This explains why the ML method can be used only when 
large jackpots are very unlikely, or else if a. is small enough and p 
large enough; as an indicative range of values, <x< 10 and p>0.5 
can be considered (admittedly, current experiments stay within 
that range). 

Regarding the third question, estimating parameters with a 
wrong model can be expected to induce some bias, whichever 
estimation method is used. As shown in [14], the GF and ML 
methods output very similar results (when both can be used). So 
the conclusions of the 'Results' section would hold as well for ML 
estimates. The main question was to evaluate which bias could be 
expected from using either the Dirac or the classical exponential 
model, when data were simulated using a more realistic model. 
The estimation of a can be expected to be robust for low values, 
because when a is small, the information is concentrated on the 
first value po = e~" that depends only on a.. The surprise was that 
it is still robust up to a = 10, when po is very small and the po- 
method cannot be used (Figure 3, left panel). For very large values 
of a, both models induce a bias on a, positive for the Dirac model, 
negative for the exponential model (Figure 3, right panel). The 
estimation of p is more sensitive to the model: estimating with the 
exponential model induces a positive bias; using the Dirac model 
reduces the bias (Figure 2 and Table 1). 

Models and Methods 

Division time distributions and growth rates 

In this section, the probabilistic model of cell division and 
mutations is described, the relation between division times and 
growth rates is precised, and the goodness-of-fit of Kelly and 
Rahn's data [15] with three families of distributions is detailed. 

In Kendall's notation [48], the model considered here is G/G/ 

0: 

• at time 0 a homogeneous culture of n normal cells is given; 

• the division time of any normal cell is a random variable with 
distribution function G; 

• when the division of a normal cell occurs, it is replaced by: 

- one normal and one mutant cell with probability 

- two normal cells with probability 1 — p; 
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• the division time of any mutant cell is a random variable with 
distribution function F; 

• when the division of a mutant cell occurs, it is replaced by two 
mutant cells; 

• all random variables and events (division times and mutations) 
are mutually independent. 

The probabilistic results used here come from the theory of 
continuous time branching processes: see [49,50]. To a distribu- 
tion of division times corresponds an exponential growth rate for 
the corresponding clones: the growth rate of a clone with binary 
divisions is the point at which the Laplace transform of division 
times equals 1 / 2 . If all division times are multiplied by a constant, 
the growth rate is divided by the same constant. Therefore scaling 
a distribution to have unit growth rate amounts to multiplying all 
division times by the initial growth rate. Here I assume that the 
time scale has been chosen so that the growth rate of normal cells 
is the relative fitness p, and the growth rate of mutants is 1: 



e-'"dG(t)-- 



'dF(t)-- 



Two particular cases will be seen as extreme values for the 
distribution F: exponential and Dirac distributions. 



F exp (t)=l-e- 



F<£r(t) = H[log(2), + oo)(0. 



where H denotes the indicator function of an interval (1 or 0 
according to whether the variable is in the interval or not). These 
distributions have coefficients of variation equal to 1 and 0 
respectively. The coefficients of variation observed in experiments 
are of order 0.2 [23]. I have chosen three families of distributions 
to illustrate my results: Gamma, Log-normal and Inverse 
Gaussian. All three have the property to be invariant through 
scaling. For instance, if X has Gamma GA(a,X) distribution, then 
sX has GA(a,X/s) distribution; similar relations hold for the two 
other families. The probability distribution functions, Laplace 
transforms, and scaling parameters are given in Table 5. As many 
other families of distributions, these three encompass the two 
extremes of exponential and Dirac distributions as limit cases and 
interpolate between them. This is illustrated by Figure 4 where 20 
densities of unit growth rate distributions are plotted for each 
family. 

In order to get one realistic distribution per family, the historical 
observations of Kelly and Rahn on Bacterium aerogenes: Table 2 
p. 149 of [15] were adjusted. A maximum likelihood estimation of 
the parameters on the data led to one particular distribution in 
each family, that was rescaled to unit growth rate. Figure 5 
illustrates the fit. On the left panel, the histogram and the 3 
densities are superposed; the right panel displays the correspond- 
ing densities after scaling to unit growth rate. Table 6 gives the 
parameters of the three densities, together with the p-values of the 
Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests: 
all three fits turn out to be satisfactory. 

Generalized Luria-Delbruck distributions 

Consider an initial (large) number n of normal cells. Assume that 
the mutation probability p is small, that the time t at which 
mutants are counted is large, and that the asymptotics are such 
that the expected number of mutations a before time t is non null 
and finite. Using general results of branching process theory [49] 
and [50], it can be proved that the total number of mutants at time 
i approximately follows an integer valued distribution, whose 



probability generating function (PGF) is given by: 

g*, P {z) = exp(a(/j p (z) - 1)), (1) 



with: 



*,(*) = 



il/(z,t)pe-f"dt, 



(2) 



where ij/(,z,f) is the PGF of M(t), i.e. the number of cells at time t 
in a mutant clone, starting from one single cell at time 0. 



\j/(z,t) = E[z 



M(t)] 



(3) 



where E denotes mathematical expectation. The explicit expres- 
sions (1) and (2) are quite general, and do not depend on any 
modelling assumption apart from exponential proliferation. If the 
individual division times of mutants are supposed to be indepen- 
dent with common distribution F, then the function >J/(z,t) is 
uniquely defined in terms of F. 

The interpretation of (1) and (2) is quite simple, and can be 
separated into the following two arguments. 

• The number of mutations converges in distribution to the 
Poisson distribution with parameter a (this remark had already 
been made by Luria and Delbriick [1]). From each mutation 
stems a mutant clone that develops at final time T into a 
random number of mutants, each with PGF h p . A random 
number of such clones must be added: the result is a Poisson 
sum of independent random variables with PGF h p . This yields 
equation (1). 

• Any given mutation happens at some division instant chosen at 
random (i.e. uniformly distributed) among all division instants. 
Due to exponential growth, division instants are more 
concentrated near the end of the observation interval. It can 
be proved that the difference between the final time and a 
randomly chosen division instant, i.e. the developing time of a 
typical mutant clone, is exponentially distributed with 
parameter p. Therefore the size at final time of a typical 
mutant clone is an exponential mixture of sizes at fixed time t. 
Hence equation (2). 

Precise mathematical statements and proofs of the asymptotics 
described above have been given in [14], and will not be 
reproduced here. I propose to name Generalized Luria-Delbriick 
distribution with parameters a, p, and F and denote by 
GLD(a,p,F), the probability distribution on the set of integers 
whose PGF g a f> is defined by (1) and (2). Observe that it depends 
on the division time distribution of normal cells G only through the 
growth rate p, whereas it does depend on the actual division time 
distribution F of mutant cells. The particular case GLD(a,p,F exp ) 
is the classical Luria-Delbriick distribution LD(a,p). In that case, 



^(0=1- 



and 



h e ; p (z)= 



-pe-r'dt. 



(4) 



The exponential case has been known for a long time: see Zheng 
[32,33] for historical accounts. As shown in [14], formula (4) 
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Gamma Log-normal Inverse Gaussian 




Figure 4. Densities of Gamma, Log-normal, and Inverse Gaussian. All densities have been rescaled to have unit growth rates. The dashed 
curve is the density of the exponential distribution with rate 1. The dashed vertical line locates the Dirac distribution at log 2. 
doi:1 0.1 371 /journal.pone.0080958.g004 



comes from the fact that the size of a mutant clone at time t follows 
the geometric distribution with parameter e — a fact already 
pointed out by Yule [51] (see also [50], p. 109). It turns out that 
explicit expressions of \^{z,s), h p (z), and g^ p can also be given in 
the case where division times are constant, which is the object of 
the next section. 

Constant division times 

Here it is assumed that division times of mutants are constant, 
i.e. -Fis the Dirac distribution at log(2) (to ensure unit growth rate). 

F, dir ( t) = A [| og (2), + oo ) (0 • 

Thus the generalized Luria-Delbruck distribution GLD(a.,p,F^ T ) is 
considered. The idea can be traced back to Haldane who used it to 
propose an approximation heuristics for calculating the probabilities 
of mutant counts: see Sarkar [30] and Zheng [31] (actually, 
Haldane's model can be related to the particular case GLD(x, 1 ,.Fd; r ). 



For the GLD(o£,p,i*dir), formula (2) becomes: 

hf r (z) = (l-2-'')J2 2 ~" Pz2 "- ( 5 ) 

n = Q 

To the best of my knowledge (5) is new. Here is how it is derived. 
With constant division times, say all equal to a, the population 
doubles at multiples of a. Hence the exponential growth rate is 
log(2)/a=l, therefore fl = log(2). Between instants na and 
(w+l)a, there are 2" cells in the clone. Hence the generating 
function at time ,s: 

+ co 

^ dir (z^)=5]z 2 "n K(n+1)fl) (. 5 ). 

H = 0 

Integrating against the exponential distribution with parameter p 
gives: 



Table 5. Characteristics of three families of distribution. 



Distribution 



Gamma 



Log-normal 



Inverse Gaussian 



parameters 
PDF 

Laplace transform 
growth rate 
unit growth rate 



GA(a,X) 
f-'l" 



;.(2'/°-i) 

Gy4(a,l/(2'/"-l)) 



log(t)-n) 1 
la 1 



LN(fi,a) 
1 

— e 

x\/2no 2 

numeric 
g.r. numeric 
ZJV(/i + log(g.r.),<T) 



IGM 



1/2 _ 



2j?t 



27rt 3 . 



12 | log z 2 



/ G (l 0 g2 + ^.^ + l2^) 
2A /( 2 



For Gamma, Log-normal and Inverse Gaussian distributions, the notation of parameters, the probability distribution function (PDF), the Laplace transform, the growth 
rate, and the scaling for unit growth rate are given. 
doi:1 0.1 371 /journal.pone.0080958.t005 
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Bacterium aerogenes unit growth rates 




10 20 30 40 50 60 70 0.0 0.5 1.0 1.5 2.0 



Figure 5. Adjusted distributions for Kelly and Rahn's data on Bacterium aerogenes [15]. On the left panel, the histogram of the data, and 
the three densities are superposed; the Gamma distribution appears in red, the Log-normal distribution in blue, the Inverse Gaussian in green. The 
blue and green curves are very close. On the right panel, the densities have been rescaled to unit growth rate. The dashed curve is the density of the 
exponential distribution, the dashed vertical line locates the Dirac distribution at log 2. 
doi:1 0.1 371 /journal.pone.0080958.g005 



+ oo 
n = Q 

hence (5), since e a =2. 

Not only the PGF, but also the probabilities of the 
GLD(!x,p,F& T ) can be easily computed. Indeed, let {pk)k>\ denote 
the probabilities of the distribution with PGF h^: 

p k = (\-2->>)2- n >> if k = 2", Oelse. 

Let (qk)k>o be the probabilities of the GLD(n,p,F^ T ). They can be 
computed by the following well known recursive formula, easily 
deduced from the probability generating function (1) (see [52] and 
references therein): 

a k 

qo = e * and for k> 1, qk = ^y'jpiqk-i- (6) 
K i=i 

The algorithm has been encoded in the R script available online: 



Table 6. Adjusted distributions for Kelly and Rahn's data on 
Bacterium aerogenes [15]. 







Distribution 


Gamma 


Log-normal 


Inverse 
Gaussian 


parameters 


0 = 11.18, i= 15.64 


fi= -0.38, (j = 0.30 


;i = 0.72, 1 = 1.50 


Kolmogorov- 
Smirnov 


0.693 


0.919 


0.874 


Anderson- 
Darling 


0.505 


0.852 


0.813 



A maximum likelihood estimation of the parameters on the data led to one 
particular distribution in each family, that was rescaled to unit growth rate. The 
parameters of the rescaled distribution are given, together with the p-values for 
the Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests. 
doi:1 0.1 371 /journal.pone.0080958.t006 



the probabilities, cumulated distribution function and quantile 
function of the GLD(<x,p,F$ u ) are provided. The log-likelihood 
and its derivatives with respect to the parameters also have explicit 
algorithms, almost identical to those implemented for the LD(a,p) 
by Zheng [13]. The conclusion is that the estimation of a. and p 
can be conducted for the GLD(n,p,F^) exactly as for the 
LD(oc,p), either by the classical Maximum Likelihood method [13] 
or by the generating function method [14]. The algorithms are 
even faster and numerically more stable in the constant division 
time model. 

General division times 

No distribution F other than F exp and F^ leads to such closed 
expressions as (4) and (5). However, it is possible to compute 
numerically h p (z) for any F, using a Monte-Carlo algorithm that 
will now be described. If a division time distribution is given, 
sequences of independent division times can be simulated at will. 
From such a sequence, a clone can be simulated up to any 
arbitrary time, outputing the number of cells as a function of time. 
That function of time is encoded by the sequence of instants at 
which the function increases by 1, i.e. when divisions occur. 
Choose a value p m ; n , such that any subsequent evaluation ofh p (z) 
will be made for values of p larger than p m j„. In simulations, I have 
chosen p mia = 0.8, but this value could be adjusted. Let 7\, . . . , Ji- 
be k independent instants, simulated according to the exponential 
distribution with parameter p m ; n . A crucial observation is that if 7), 
is exponentially distributed with parameter p^, then for any 
p > Pmin> Pj f L Th Th is exponentially distributed with parameter 
p. For h = 1, . . . ,k, denote by Nf,(t) the number of living cells at 
time t in a random clone, starting from a single mutant cell at time 
0, simulated up to time T/,. For any p^p^, and any ze[0,l], 
consider: 

k h=\ 

By the law of large numbers, as k tends to infinity, h p (z) converges 
to h p (z). The central limit theorem yields a precision of order 



PLOS ONE | www.plosone.org 



9 



December 2013 | Volume 8 | Issue 12 | e80958 



Fluctuation Analysis: Can Estimates Be Trusted? 



(sum of two independent exponentially distributed random 
variables). Therefore h p (z) is the PGF of the number of cells in 
a clone starting from a single mutant cell at time 0, observed up to 
an independent, Gamma distributed random time. Let T\, . . . , Ji- 
be k independent instants, simulated according to the Gamma 
distribution with parameters 2 and p^. For h = 1, . . . ,k, denote 
by N/,(t) the number of living cells at time t in a random clone, 
starting from a single mutant cell at time 0, simulated up to time 
T/,. For any p>p^ n , and any ze[0,l], consider: 

1 k . . 
h p (z) = - V z N ^ T hPmmlP\ 
k h=\ 

By the law of large numbers, as k tends to infinity, h p (z) converges 
to h p (z). 

Further savings in computer time can be obtained by the 
following remark. Let T follow the Gamma distribution with 
parameters 2 and p. Let U be another random variable, 
independent from T, uniformly distributed on the interval [0, 1] . 
Then ITT follows the exponential distribution with rate p. 
Therefore the same k clones, simulated up to Gamma 
distributed instants, can be used to estimate the values of both 
h p {z) and h p (z). 



1£3 



O 



If) 
CO 




0.8 0.9 1.0 1.1 1.2 



P 

Figure 6. Ratios for GF estimators of the relative fitness p. Ratios fz ixi (p) = j^fefet as functions of p, for z\ =0.1 and r 2 = 0.9. The ratios 
depend on the division time distribution: exponential (solid black), Dirac (dashed blaclc), Gamma (red), Log-normal (blue), Inverse Gaussian (green). 
The realistic distributions are close together, and closer to the Dirac case than to the exponential case. This explains why the classical Luria-Delbruck 
model induces a positive bias on the estimation of p, and why the Dirac model yields better results. 
doi:1 0.1 371 /journal.pone.0080958.g006 



1 / \fk on the result. In simulations (in particular to compute the 
curves of Figure 6), k has been fixed to 10 . Observe that the (time 
consuming) simulation of the k clones needs to be done only once: 
from there, all subsequent evaluations of h p (z) will be deduced. 

As will be seen in the next section, the estimation of a and p and 
the computation of their confidence intervals require repeated 
evaluations of the derivative in p of h p (z). Using the procedure 
above to evaluate that derivative by finite differences would lead to 
quite unprecise results. Another procedure, similar to the previous 
one, is proposed instead. The derivative in p of h p (z) is: 



Sh p (z) 
dp 



iJ/(z,s)e-P s ds-p 



ijf(z,s)se-'' s ds 



with: 



-h p (z)- -hJz), 
P P 



hJz)- 



\l/(z,s)p 2 se- l,s ds. 



Now p 2 se ps is the density of the Gamma distribution GA(2,p) 



l(p), z1=0.1, z2=0.9 
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Generating function estimators 

The main goal of fluctuation analysis is to estimate the mutation 
probability p, from a sample of mutant counts. If an estimate of the 
mean number of mutations a has been calculated, then an estimate 
of p can be deduced, dividing by the final number of cells: the 
parameter of main interest is <x. Many methods of estimation for a 
have been proposed: see [4]. The simplest consists in estimating 
the probability of observing no mutant: e~ a ; this is the original 
method used by Luria and Delbruck [1], and is usually referred to 
as "/>rj-method". Observe that the result does not depend on p, nor 
on F. Therefore the />o-method is completely independent from 
any modelling hypothesis. It that sense it is the most robust of all 
methods. However, the />o-method can be used only if a is small 
enough (so a sizeable number of tubes do not contain any mutant). 
As explained in [ 1 4] and in the discussion section, such a limitation 
cannot be accepted. 

Apart from the po-method, any other consistent estimator of a 
must depend on the value of p and on the mutant division time 
distribution F. Maximum Likelihood is usually considered the best 
estimation method in a parametric inference problem. For the 
estimation of the parameters a. and p of the classical LD(oc,p), it 
has been recommended by several authors: [12,13,45,46]. In [14], 
its limitations were pointed out, and an alternative procedure, 
based on the empirical probability generating function (EPGF), 
was proposed. It turns out that the EPGF method can be adapted 
to the general case of the GLD(cc,p,F), whereas the Maximum 
Likelihood cannot. It only relies upon the numerical evaluations of 
h p (z) and its derivative in p. For the LD(a,p) and GMD(a,p,Ffa) 
explicit formulas are available, for the other cases a Monte-Carlo 
algorithm was described in the previous section. The procedure is 
described below, and the reader is refered to the R functions that 
have been made available online for implementation details: they 
include estimation, confidence intervals, and hypothesis testing. 

Let {X\ , . . . ,X n ) be a sample of independent random variables, 
each with GLD(oi,p,F) distribution. Recall the probability 
generating function of the GLD(a,p,F): 



with: 



ij/(z,t)pe-i"dt. 



Define the empirical probability generating function (EPGF) g„(z) 
as: 



g-„(z)=-V: 

n — ' 



The random variables z x i are bounded and mutually independent: 
by the law of large numbers, g„{z) is a consistent estimator of 
ga,p(z), for any z in [0,1]. For 0<zi<z 2 <l, consider the 
following ratio: 



hpjzQ-l = l0gg a ,p(Z]) 

h p (z 2 )-l iogg^ p (z 2 )' 



(7) 



The function that maps p onto y=f 2l ,z 2 (p) is continuous and 
stricdy monotone, hence one-to-one. Therefore the inverse, that 
maps j onto P=f z ~l 2 (y), is well defined. For 0<zi<z 2 <l, let 
pn(z\ ,z 2 ) denote the following log-ratio. 



y n (zi,z 2 )-- 



logfe (zQ) 
log(£„(z 2 )) ' 



An estimator of p is obtained by: 

P„(zi,z 2 )=/ Z ~] 2 (j>„) 
Then an estimator of ot by: 



a„(zi,z 2 ,z 3 )= 



log(g„(z 3 )) 
h Mv2 )(z3)-l : 



where Z3e(0; 1) is a new control, possibly different from Z\ and z 2 . 
Observe that &„(zi,z 2 ,Z3) depends on p„(zi,z 2 ), whereas p„(zi,z 2 ) 
only depends on the arbitrary choice of the couple (zi,z 2 ). They 
will be referred to as generating function (GF) estimators. The 
strong consistence and asymptotic variance of the GF estimators 
were studied in [14], and mathematical details will not be 
reproduced here. In particular, Proposition 4. 1 of that reference 
gives the explicit form of the asymptotic covariance matrix, upon 
which inference procedures are based (confidence intervals and 
hypothesis testing). The asymptotic covariance matrix has been 
encoded in the R functions made available online; it expresses in 
terms of: 

• the PGF g^p evaluated at Z\, z 2 , Z3, and their products two by 
two, 

• the PGF h p evaluated at z\, z 2 , Z3, 

• the derivative in p of h p , evaluated at Z\, z 2 , Z3. 

The GF estimators depend on the three arbitrary values erf Zi, z 2 
and Z3. Another tuning parameter has to be added. In the 
GLD(a,p,F) the parameter p, determines the size and frequency 
of much larger values than usual (called "jackpots" in [1]). For 
p < 1 , some very large values can be obtained, even for a small a. 
Using the empirical probability generating function is a simple 
way to damp down jackpots, and get robust estimates. The 
variable z can be seen as a tuning parameter for the damping. At 
the limit case z = 0, g„(0) is simply the frequency of null values, 
and a„(0)= — log(g„(0)) is the so called po-estimator of a, already 
proposed in [1] (it does not depend on p nor F). For Zj =0.1, only 
small observations will be taken into account, whereas for z 2 =0.9, 
much larger values will influence the sum. Thus the empirical 
probability generating function damps down jackpots in a 
differential way according to z\ and z 2 . Choosing Zi=0.1 and 
z 2 = 0.9 will contrast small values compared to jackpots, which 
explains why p n can efficiently estimate p for small a's. However, 
for large values of a (say a. > 5), even z 2 = 0.9 will output very small 
values, below the machine precision. This will make the estimates 
numerically unstable. A natural way to stabilize them is to rescale 
the sample, dividing all values by a common factor b. This 
amounts to replacing z by z 1 /* in the definition of g„(z): 



z*il b = - ^ (z l > b fi = g„(z 1/A ). 



I proposed to set b to the q-th quantile of the sample, where q is 
another control. Based on simulation evidence, my best compro- 
mise is Zi=0.1, z 2 =0.9, Z3=0.8, (7 = 0.1. In the implementation 
of the GF estimators, the scaling factor b is set to the q-l\~i quantile 
of the sample, and all data are divided by that scaling factor (which 
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amounts to replacing z\ 9 Z2 9 z$ by z/ ,z 2 ,z 3 ). The estimators a n 
and p n are computed with these values. 

The GF estimators crucially rely upon the inverse of the 
function^ defined by (7). Figure 6 shows variations of/" according 
to the underlying model. On that figure, plots of / for the 
exponential and Dirac model have been represented, together with 
plots of/ for the 3 distributions determined by fitting actual data. 
The curves corresponding to realistic distributions are close 
together, and closer to the Dirac case than to the exponential 
case. From this graphics, it can be anticipated that estimating p 
with the classical Luria-Delbriick model induces a positive bias; 
this was indeed observed on simulations. Also the fact that the 
slope of the curve corresponding to the exponential model is 
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